<html>
<head>
<link rel="shortcut icon" href="./favicon.ico">
<link rel="stylesheet" type="text/css" href="./style.css">
<link rel="canonical" href="./reading.html">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>References and Reading List</title>
<meta name="description" content="Here are the sources I've used while writing the FPGA Design Elements, covering digital design, bit manipulation, elastic pipelines, and clock-domain crossing.">
</head>
<body>

<h1>References and Reading List</h1>

<p>Here are the sources I've used while writing the FPGA Design Elements,
covering digital design, bit manipulation, elastic pipelines, and clock-domain
crossing.

<a name="digital"</a> 
<h2>Digital Design</h2>

<ul>

<li><h3>Design Books</h3>

<ul>

<li>Most books about digital design repeat the same superficial basics, and are
usually out-of-print, out-of-date, and overpriced. The exception is <a
href="https://www.wakerly.org/DDPP/">Digital Design Principles and Practices</a>
by John F. Wakerly, which covers many topics effectively, and deals with
real-life details at all levels, from microelectronics to coding practices. It
also discusses number systems better than other textbooks.

<li>Another useful book is "Logical Effort: Designing Fast CMOS Circuits" by Ivan
Sutherland, Bob Sproull, and David Harris. It describes the relationships
between the source/load capacitances and drive strengths of CMOS circuits in a
simple and very structured way, allowing you to optimize and compare the delay
of different circuit implementations pretty much using pencil-and-paper level
mathematics. You can read a summary of Logical Effort Theory and an example of
its application in one of my old project reports: <a
href="https://fpgacpu.ca/writings/OptimizationOfBurstModeCircuitsUsingLogicalEffortTheory.pdf">Optimization
Of Burst-Mode Circuits Using Logical Effort Theory</a>.

<li>The original VLSI book, "Introduction To VLSI Systems", by Carver Mead and
Lynn Conway, is still worth looking through even though it describes obsolete
NMOS technology with lamba-based design rules, because of its illuminating
perspective (and beautiful full-color circuit layouts). It's the book that made
me understand that capacitance is <b>necessary</b> for digital circuits to
operate at all! The chapter on self-timed circuits is still one of the
fundamental texts on asynchronous circuits.

</ul>

<li><h3>Finite State Machines</h3>

<p>Two books which I've found talk usefully about FSM design and implementation are:
<ul>
<li>"FSM-based Digital Design using the Verilog HDL" by Elliott Mins
<li>"Finite State Machine Datapath Design, Optimization, and Implementation" by Reese Davis 
</ul>

<li><h3>Arbiters</h3>

<p>There isn't much written about arbiters out there that isn't specializations
of existing designs for particular uses. I couldn't find any good treatment of
the fundamentals, and some ASIC arbiter designs rely on combinational loops,
which are not allowed in conventional synchronous design. Here's what I could
find:

<ul>
<li>"Principles and Practices of Interconnection Networks" by Dally and Towles, specifically Chapter 18, "Arbitration". <em>This is also the major text on Network-on-Chip design.</em>
<li>"Microarchitecture of Network-on-Chip Routers" by Dimitrakopoulos, Psarras, and Seitanidis, specifically Chapter 4, "Arbitration Logic".
<li>"Scalable arbiters and multiplexers for on-FPGA interconnection networks" by Dimitrakopoulos, Psarras, and Seitanidis, published at FPL 2011.
<li>"Arbiters: Design Ideas and Coding Styles" by Matt Weber, published at SNUG Boston 2001. <em>This is the most accessible source material for priority and round-robin arbiter design.</em>
<li>The <a href="./Bitmask_Isolate_Rightmost_1_Bit.html">Bitmask: Isolate Rightmost 1 Bit</a> module efficiently implements a <a href="./Arbiter_Priority.html">Priority Arbiter</a> using adder carry-chain logic.

</ul>

<li><h3>Resets</h3>Ken Chapman's <a
href="https://www.xilinx.com/support/documentation/white_papers/wp272.pdf">Get
Smart About Reset: Think Local, Not Global</a> discusses why, on FPGAs, it's
important to minimize the number of registers receiving a reset signal, and to
instead use the built-in register initialization at configuration or to
architect the system to eventually get to a consistent state after a minimal
reset.

<a name="flancter"</a> 
<li><h3>Flancter</h3>Rob Weinstein's <a href="./Flancter_App_Note.pdf">Flancter</a> is a clever
little circuit which allows you to set a bit in one clock domain, and clear it
from another clock domain <em>without using an asynchronous reset or even
having access to the setting clock domain</em>. It needs supporting circuitry
to be useful (e.g.: at least one <a href="./CDC_Bit_Synchronizer.html">CDC Bit
Synchronizer</a>), and you will have to <a
href="https://forums.xilinx.com/t5/Implementation/How-to-constraint-a-flancter/td-p/672533">carefully
instruct your CAD tool</a> on how to analyze the timing and set min/max delays
on the two signals which cross clock domains without synchronization, and
design your control circuitry to keep the set/reset signals mutually exclusive.
That said, it's amazing for some advanced use cases, such as <a
href="./Flancter_fastevent_counter.pdf"> counting very fast and asynchronous
sensor pulses</a>. However, in most cases, you can get the same functionality
with simpler CDC behaviour by using a <a href="./CDC_Flag_Bit.html">CDC Flag
Bit</a>.

<p>(Thanks to Eric Smith (<a
href="https://www.twitter.com/brouhaha">@brouhaha</a>) for allowing me to host a
copy of the hard-to-find original Flancter App Note, originally from his <a
href="https://www.floobydust.com/">Floobydust</a> page.)


</ul>

<a name="bit"</a> 
<h2>Bit Manipulation</h2>

<p>Bit manipulation algorithms usually contain only Boolean, bit-shift, and
addition operations, are often branch-free, and operate in parallel on all the
bits in a word. Thus, they translate naturally to high-performance hardware.
Note that many bit manipulation algorithms assume a certain word width, which
affects the necessary constants.

<ul>

<li> <a name="Warren2013"></a><a
href="https://web.archive.org/web/20190916060535/https://hackersdelight.org/">Hacker's
Delight</a>, 2nd ed., 2013, Henry S. Warren, Jr. This book covers algorithms
related to bit manipulation, arithmetic, branch-free code, and other deep
topics. A must-have for hardware designers. The website, which has extra
material and errata, has since gone down, so this is an Archive.org link.

<li> <a href="https://catonmat.net/low-level-bit-hacks">Introduction to Low
Level Bit Hacks</a> provides a detailed explanation of the fundamental bit
manipulations from which most others are built. (See: Hacker's Delight, Chapter
2, "Basics").

<li> <a href="https://graphics.stanford.edu/~seander/bithacks.html">Bit
Twiddling Hacks</a> is another repository of bit manipulation algorithms,
complementing Hacker's Delight.

<li> <a href="https://aggregate.org/MAGIC/">The Aggregate Magic Algorithms</a>
has a few different algorithms, and notably, SWAR (SIMD Within A Register).

<li> <a href=""https://bits.stephan-brumme.com>the bit twiddler</a> lists some
C code, with cycle counts, explanations, and assembly output.

<li> <a href="https://programming.sirrida.de/">Programming pages</a> of Jasper
Neumann. Detailed explanations with beautiful diagrams.

<li> <a href=""https://github.com/keon/awesome-bits>awesome-bits</a> by Keon
Kim. Some nifty string and float algorithms here.

<li> The defunct (thus the archive.org links) <a
href="https://web.archive.org/web/20180714171943/https://chessprogramming.wikispaces.com/">Chess
Programming Wiki</a> describes 64-bit <a
href="https://web.archive.org/web/20180715002049/https://chessprogramming.wikispaces.com/Bitboards">Bitboards</a>,
used to represent chess boards, which lend themselves to a rich discussion of
various bit algorithms and their implementations:

    <ul>

    <li> <a href="https://web.archive.org/web/20180715002044/https://chessprogramming.wikispaces.com/General%20Setwise%20Operations">General Setwise Operations</a>
    <li> <a href="https://web.archive.org/web/20180715003918/https://chessprogramming.wikispaces.com/Bit-Twiddling">Bit Twiddling</a>
    <li> <a href="https://web.archive.org/web/20180713092652/https://chessprogramming.wikispaces.com/population%20count">Population count</a>
    <li> <a href="https://web.archive.org/web/20180715002053/https://chessprogramming.wikispaces.com/BitScan">BitScan</a>
    <li> <a href="https://web.archive.org/web/20180716150755/https://chessprogramming.wikispaces.com/Flipping%20Mirroring%20and%20Rotating">Flipping Mirroring and Rotating</a>

    </ul>

</ul>

<a name="elastic"</a> 
<h2>Elastic Pipelines</h2>

<p>The following papers are mainly useful for background and for implementation
guides to pipeline branches, joins, forks, etc... Note however that in the
academic literature, elastic pipelines are often expressed as valid/stop
handshakes, rather than valid/ready handshakes, where "stop" is the inverse of
"ready".  This means parts of the diagrams have inverted logic from what it
would be in valid/ready handshakes. Also, watch out for combinational paths,
which aren't completely avoided in these designs: use <a
href="./Pipeline_Skid_Buffer.html">Skid Buffers</a> as necessary.

<ul>

<li>R. T. Possignolo, E. Ebrahimi, H. Skinner and J. Renau, <a
href="https://masc.soe.ucsc.edu/docs/iccd16.pdf">Fluid Pipelines: Elastic
Circuitry meets Out-of-Order Execution</a>, 2016 IEEE 34th International
Conference on Computer Design (ICCD), Scottsdale, AZ, 2016, pp. 233-240, doi:
10.1109/ICCD.2016.7753285

<li>J. Cortadella, M. Kishinevsky, and B. Grundmann, <a
href="https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.99.9778">SELF:
Specification and design of synchronous elastic circuits</a>, TAU ’06:
Proceedings of the ACM/IEEE International Workshop on Timing Issues

<li>M. Abbas and V. Betz, <a
href="https://kalman.mee.tcd.ie/fpl2018/content/pdfs/FPL2018-43iDzVTplcpussvbfIaaHz/1PNSl54xKC7BAFw7YOeZRT/1sc2uahgEryvvJ7qQkPXkz.pdf">Latency
Insensitive Design Styles for FPGAs</a>, 2018 28th International Conference on
Field Programmable Logic and Applications (FPL), Dublin, 2018, pp. 360-3607,
doi: 10.1109/FPL.2018.00068

</ul>

<a name="cdc"</a> 
<h2>Clock Domain Crossing</h2>

<p>Frankly, most writing about CDC is superficial and subtly wrong. Here is some of the best writings on the topic.

<ul>

<li>Clifford E. Cummings, Don Mills, and Steve Golson, <a
href="https://www.sunburst-design.com/papers/CummingsSNUG2003Boston_Resets.pdf">Asynchronous
& Synchronous Reset Design Techniques - Part Deux</a>, SNUG 2003, Boston

<li>Clifford E. Cummings, <a
href="https://www.sunburst-design.com/papers/CummingsSNUG2008Boston_CDC.pdf">Clock
Domain Crossing (CDC) Design & Verification Techniques Using SystemVerilog</a>,
SNUG 2008, Boston

<li>Clifford E. Cummings, <a
href="https://www.sunburst-design.com/papers/CummingsSNUG2002SJ_FIFO1.pdf">Simulation
and Synthesis Techniques for Asynchronous FIFO Design</a>, SNUG 2002, San Jose

</ul>

<hr><a href="./index.html">Back to FPGA Design Elements</a>
</body>
</html>
